GUI Agents (Computer Use)

Operating a PC with an LLM!? Trying out Claude's new "computer use" feature right away

https://qiita.com/hedgehog051/items/eddcf26ad5d1dc8086d5

OmniParser

https://huggingface.co/microsoft/OmniParser

OmniParser for pure vision-based GUI agent

https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/

Large Language Model-Brained GUI Agents: A Survey

https://arxiv.org/abs/2411.18279

Agent S: An Open Agentic Framework that Uses Computers Like a Human

https://arxiv.org/abs/2410.08164

https://github.com/simular-ai/Agent-S

OmniParser for Pure Vision Based GUI Agent

https://arxiv.org/abs/2408.00203

OS-Atlas: A Foundation Action Model For Generalist GUI Agents

https://github.com/OS-Copilot/OS-Atlas

https://arxiv.org/abs/2410.23218

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

https://arxiv.org/abs/2404.05719

A quick overview of web agents that operate browsers with LLMs, and related technologies

https://tech.algomatic.jp/entry/survey/agent/web-navigation

BrowserGym

https://github.com/ServiceNow/BrowserGym

BrowserGym leaderboard

https://huggingface.co/spaces/ServiceNow/browsergym-leaderboard

Isn't this practically AGI? Browser Use is born: an AI that operates the browser on its own and does all sorts of things for you

https://note.com/shi3zblog/n/n960fc72b36e9

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

https://arxiv.org/abs/2412.17589

Auto-generating code that drives a browser with Python and Playwright

https://qiita.com/mainy/items/3a9de19f440991f67f34

Improving browser-use and connecting it to AI agents

https://github.com/browser-use/web-ui

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

https://arxiv.org/abs/2501.04575

Overview of Operator

https://note.com/npaka/n/nc05d63fde4bf

Using the Browser Use Web UI to consider applying AI agents to business systems

https://dev.classmethod.jp/articles/browser-use-web-ui/

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

https://arxiv.org/abs/2501.12326

E2B Desktop Sandbox: A secure virtual environment for GUI-operating agents

https://www.ai-shift.co.jp/techblog/5515

https://github.com/e2b-dev/desktop/

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

https://arxiv.org/abs/2503.21620

Computer Use: Comparing OpenAI and Anthropic, and the outlook ahead

https://studyco.connpass.com/event/350551/presentation/

Trying "KARAKURI VL", a Japanese VLM for Computer-Using Agents

https://zenn.dev/kun432/scraps/896f0ef6490adf

GTA1: GUI Test-time Scaling Agent

https://arxiv.org/pdf/2507.05791